Preprocess the dataframe X to remove newlines #430

Merged
tramora merged 1 commit into dev from 413-multi-line-fields-in-dataframes
Jul 9, 2025
Conversation

@tramora tramora commented Jul 7, 2025

Fixes issue #413


TODO Before Asking for a Review

  • [X] Rebase your branch to the latest version of dev (or main for release PRs)
  • [X] Make sure all CI workflows are green
  • When adding a public feature/fix: Update the Unreleased section of CHANGELOG.md (no date)
  • [X] Self-Review: Review the "Files Changed" tab and fix any problems you find
  • API Docs (only if there are changes in docstrings, rst files or samples):
    • [X] Check that the docs build without warnings: see the log of the API Docs workflow
    • [X] Check that your changes render well in HTML: download the API Docs artifact and open index.html
    • If there are any problems, it is faster to iterate by building the API Docs locally

@tramora tramora force-pushed the 413-multi-line-fields-in-dataframes branch from 743360d to 6a84438 Compare July 7, 2025 13:14

@popescu-v popescu-v left a comment

Some minor issues (see the comments).

@tramora tramora force-pushed the 413-multi-line-fields-in-dataframes branch from 6a84438 to d9d7363 Compare July 7, 2025 15:54
@tramora tramora requested a review from popescu-v July 7, 2025 16:12

@popescu-v popescu-v left a comment

One remaining doc comment.

…ewlines

To avoid any code duplication,
the best place to perform the preprocessing
is just before handing over to Khiops (when writing the CSV file).
@tramora tramora force-pushed the 413-multi-line-fields-in-dataframes branch from d9d7363 to 3675d28 Compare July 8, 2025 23:21
@tramora tramora requested a review from popescu-v July 9, 2025 09:06
so that Khiops can handle them.
Hence, multi-line records are preprocessed:
carriage returns / line feeds are replaced
with blank spaces before being handed over to Khiops.

Add:

A memory penalty is incurred, as a new dataframe is generated following the preprocessing.
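
For readers following the thread, here is a minimal sketch of the kind of preprocessing discussed above, assuming the input is a pandas dataframe. The helper name `replace_newlines` is hypothetical and is not taken from the PR diff; collapsing a `\r\n` pair into a single space is also an assumption, since the doc text only says carriage returns / line feeds are replaced with blank spaces.

```python
import re

import pandas as pd

# Matches a CRLF pair, a lone carriage return, or a lone line feed.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def replace_newlines(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new dataframe whose string cells contain no CR/LF.

    Hypothetical helper for illustration only. It assumes column
    names are unique.
    """
    # Work on a copy so the caller's dataframe is left untouched.
    # This copy is the memory penalty mentioned in the review comment.
    cleaned = df.copy()
    for name in cleaned.columns:
        column = cleaned[name]
        is_text = pd.api.types.is_object_dtype(column) or pd.api.types.is_string_dtype(column)
        if is_text:
            # Only str cells are rewritten; missing values and other
            # objects pass through unchanged.
            cleaned[name] = column.map(
                lambda v: _NEWLINE.sub(" ", v) if isinstance(v, str) else v
            )
    return cleaned


if __name__ == "__main__":
    X = pd.DataFrame({"id": [1, 2], "text": ["first\nline", "second\r\nline"]})
    # Preprocess just before writing the CSV content.
    print(replace_newlines(X).to_csv(index=False))
```

Because the function returns a copy, `X` still holds the original multi-line values after the call; an in-place variant would avoid the extra memory at the cost of mutating user data.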

@popescu-v popescu-v left a comment

LGTM.

@tramora tramora merged commit 359af8d into dev Jul 9, 2025
155 of 159 checks passed
@tramora tramora deleted the 413-multi-line-fields-in-dataframes branch July 9, 2025 14:19

2 participants